CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction

نویسندگان

  • Xiaojun Wan
  • Jianguo Xiao
چکیده

Previous methods usually conduct the keyphrase extraction task for single documents separately without interactions for each document, under the assumption that the documents are considered independent of each other. This paper proposes a novel approach named CollabRank to collaborative single-document keyphrase extraction by making use of mutual influences of multiple documents within a cluster context. CollabRank is implemented by first employing the clustering algorithm to obtain appropriate document clusters, and then using the graph-based ranking algorithm for collaborative single-document keyphrase extraction within each cluster. Experimental results demonstrate the encouraging performance of the proposed approach. Different clustering algorithms have been investigated and we find that the system performance relies positively on the quality of document clusters.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

Single Document Keyphrase Extraction Using Neighborhood Knowledge

Existing methods for single document keyphrase extraction usually make use of only the information contained in the specified document. This paper proposes to use a small number of nearest neighbor documents to provide more knowledge to improve single document keyphrase extraction. A specified document is expanded to a small document set by adding a few neighbor documents close to the document,...

متن کامل

A New Domain Independent Keyphrase Extraction System

In this paper we present a keyphrase extraction system that can extract potential phrases from a single document in an unsupervised, domain-independent way. We extract word n-grams from input document. We incorporate linguistic knowledge (i.e., part-of-speech tags), and statistical information (i.e., frequency, position, lifespan) of each n-gram in defining candidate phrases and their respectiv...

متن کامل

Collaborative Ranking between Supervised and Unsupervised Approaches for Keyphrase Extraction

Automatic keyphrase extraction methods have generally taken either supervised or unsupervised approaches. Supervised methods extract keyphrases by using a training document set, thus acquiring knowledge from a global collection of texts. Conversely, unsupervised methods extract keyphrases by determining their relevance in a single-document context, without prior learning. We present a hybrid ke...

متن کامل

Local Word Vectors Guiding Keyphrase Extraction

Automated keyphrase extraction is a fundamental textual information processing task concerned with the selection of representative phrases from a document that summarize its content. This work presents a novel unsupervised method for keyphrase extraction, whose main innovation is the use of local word embeddings (in particular GloVe vectors), i.e. embeddings trained from the single document und...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008